Introduction


This report describes results of benchmark tests on Steger, a 250 MHz Origin 2000 system with R10K
processors currently installed at the NASA Ames National Advanced Supercomputing (NAS) facility.
For comparison purposes, the tests were also run on Lomax, a 400 MHz Origin 2000 with R12K
processors.

The BT, LU, and SP application benchmarks in the NAS Parallel Benchmark Suite and the kernel
benchmark FT were chosen to measure system performance. Having been written to measure
performance on Computational Fluid Dynamics applications, these benchmarks are assumed appropriate
to represent the NAS workload. Since the NAS runs both message passing (MPI) and shared-memory
compiler directive type codes, both MPI and OpenMP versions of the benchmarks were used. The MPI
versions used were the latest official release of the NAS Parallel Benchmarks (NPB). The OpenMP
versions used were PBN3b2, a beta version that is in the process of being released. NPB and
PBN3b2 are technically different benchmarks, and NPB results are not directly comparable to PBN
results.


Links to descriptions of the benchmarks themselves. 


NPB description 
PBN description 

All runs were Class C and compiled with 64-bit addressing. The MPI programs were compiled with the
-O2 compiler flag, described as extensive optimization by the SGI compiler man pages. The PBN
runs were compiled with the -O3 compiler flag, the highest level of optimization. Different
flags were used because the MPI BT Class C benchmark ran faster when compiled with the -O2 flag.
After running the MPI benchmarks it was discovered that this was not the case for all benchmarks, so
the -O3 flag, normally the faster option, was used for the OpenMP benchmarks. When the MPI
benchmarks were run after compiling with the -O3 flag on Steger, the timings were within 5% of the
times obtained compiling with -O2, so compiler flag choice did not significantly affect the results.
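The "within 5%" comparison above amounts to a relative-difference check between two timings. A minimal sketch in Python follows. The helper names are illustrative, and the -O3 timing is a made-up value, since the report does not list the -O3 MPI timings; only the -O2 figure is taken from Table 3.

```python
def relative_difference(t_new: float, t_ref: float) -> float:
    """Return |t_new - t_ref| / t_ref, the fractional change against the reference."""
    return abs(t_new - t_ref) / t_ref


def within_tolerance(t_new: float, t_ref: float, tol: float = 0.05) -> bool:
    """True when t_new differs from t_ref by no more than tol (5% by default)."""
    return relative_difference(t_new, t_ref) <= tol


if __name__ == "__main__":
    t_o2 = 1266.93  # Table 3, LU Class C mean seconds (MPI, compiled with -O2)
    t_o3 = 1300.00  # HYPOTHETICAL -O3 timing, for illustration only
    print(f"relative difference: {relative_difference(t_o3, t_o2):.2%}")
    print("within 5%:", within_tolerance(t_o3, t_o2))
```

Dividing by the -O2 time treats it as the reference; dividing by the -O3 time instead would change the percentage slightly.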

Summary of Results 



The mean, median, and standard deviation of the timings and MOPS counts for each benchmark are 
presented below in Tables 3, 4, 5 and 6. To get a reasonable sample, seven runs of each benchmark were 
done on each machine. 
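As an illustration of how the three summary statistics can be derived from seven runs, here is a short Python sketch. The timings are hypothetical placeholders, not data from this report, and the use of the sample (n-1) standard deviation is an assumption, since the report does not say which formula was applied.

```python
import statistics


def summarize(samples):
    """Return the median, mean, and sample standard deviation of a list of runs."""
    return {
        "median": statistics.median(samples),
        "mean": statistics.mean(samples),
        # ASSUMPTION: sample (n-1) standard deviation; the report does not say
        # whether the sample or the population formula was used.
        "stdev": statistics.stdev(samples),
    }


if __name__ == "__main__":
    # HYPOTHETICAL wall-clock times in seconds for seven runs of one benchmark.
    seconds = [1264.1, 1265.0, 1266.3, 1266.9, 1268.3, 1269.0, 1270.2]
    for name, value in summarize(seconds).items():
        print(f"{name:>6}: {value:.2f}")
```

With an odd number of runs such as seven, the median is simply the fourth value after sorting, so it is less sensitive to a single slow run than the mean.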

All runs were done on a machine controlled by a custom PBS scheduler written by Ed Hook of CSC
Corp, working at the NASA NAS division, which uses cpusets and an awareness of machine topology to
ensure memory allocation and execution on physically contiguous nodes. Because the machines were
space shared, not time shared, interference from other jobs was minimized. The lack of run time
variation among benchmark runs supports this hypothesis.
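One way to quantify run-to-run variation is the coefficient of variation, the standard deviation divided by the mean. The Python sketch below applies it to the Mean and Std. Dev. rows of Table 3; the function names are illustrative and not part of the original analysis.

```python
# (mean, std. dev.) pairs copied from Table 3 (Steger, MPI, Class C).
TABLE_3 = {
    "BT": {"seconds": (2925.58, 120.77), "mops": (979.75, 4.77)},
    "FT": {"seconds": (631.09, 22.90), "mops": (628.81, 20.89)},
    "LU": {"seconds": (1266.93, 2.38), "mops": (1609.40, 3.02)},
    "SP": {"seconds": (1894.41, 4.23), "mops": (765.47, 1.71)},
}


def coefficient_of_variation(mean, stdev):
    """Standard deviation expressed as a fraction of the mean."""
    return stdev / mean


def variation_table(table):
    """Map each benchmark and metric to its coefficient of variation."""
    return {
        bench: {
            metric: coefficient_of_variation(mean, stdev)
            for metric, (mean, stdev) in metrics.items()
        }
        for bench, metrics in table.items()
    }


if __name__ == "__main__":
    for bench, metrics in variation_table(TABLE_3).items():
        print(f"{bench}: seconds {metrics['seconds']:.2%}, MOPS {metrics['mops']:.2%}")
```

With the Table 3 values, LU and SP come out at roughly 0.2% on both metrics.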

Table 3 - Steger - 250 MHz R10K - MPI results 


BT Class C      Seconds        MOPS
Median          2922.24      980.85
Mean            2925.58      979.75
Std. Dev.        120.77        4.77

FT Class C      Seconds        MOPS
Median           623.53      635.72
Mean             631.09      628.81
Std. Dev.         22.90       20.89

LU Class C      Seconds        MOPS
Median          1266.25     1610.26
Mean            1266.93     1609.40
Std. Dev.          2.38        3.02

SP Class C      Seconds        MOPS
Median          1894.96      765.24
Mean            1894.41      765.47
Std. Dev.          4.23        1.71
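Because a MOPS figure is an operation count (in millions) divided by elapsed seconds, the product of the Seconds and MOPS columns should be roughly constant for a given benchmark and class. The Python sketch below applies that sanity check to the Median and Mean rows of Table 3. The helper names are illustrative, and the product of two averaged columns is only an approximation to the underlying operation count.

```python
# (seconds, MOPS) pairs copied from Table 3 (Steger, MPI, Class C).
TABLE_3_ROWS = {
    "BT": {"median": (2922.24, 980.85), "mean": (2925.58, 979.75)},
    "FT": {"median": (623.53, 635.72), "mean": (631.09, 628.81)},
    "LU": {"median": (1266.25, 1610.26), "mean": (1266.93, 1609.40)},
    "SP": {"median": (1894.96, 765.24), "mean": (1894.41, 765.47)},
}


def implied_million_ops(seconds, mops):
    """Millions of operations implied by a timing and its MOPS rate."""
    return seconds * mops


def consistency(rows):
    """Relative gap between the median-row and mean-row products per benchmark."""
    gaps = {}
    for bench, row in rows.items():
        from_median = implied_million_ops(*row["median"])
        from_mean = implied_million_ops(*row["mean"])
        gaps[bench] = abs(from_median - from_mean) / from_mean
    return gaps


if __name__ == "__main__":
    for bench, gap in consistency(TABLE_3_ROWS).items():
        print(f"{bench}: median and mean rows differ by {gap:.3%}")
```

For all four benchmarks the two products agree to within about 0.2%, which suggests the Seconds and MOPS columns in these rows are mutually consistent.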


Hardware info: 

IRIX64 steger 6.5 10120851 IP27 
256 250 MHZ IP27 Processors 
CPU: MIPS R10000 Processor Chip Revision: 3.4
FPU: MIPS R10010 Floating Point Chip Revision: 3.4
Main memory size: 65536 Mbytes 
Instruction cache size: 32 Kbytes 
Data cache size: 32 Kbytes 

Secondary unified instruction/data cache size: 4 Mbytes 

Table 4 - Lomax - 400 MHz R12K Origin 2000 MPI Results 





Hardware info: 

IRIX64 steger 6.5 10120851 IP27 
256 250 MHZ IP27 Processors 
CPU: MIPS R10000 Processor Chip Revision: 3.4
FPU: MIPS R10010 Floating Point Chip Revision: 3.4
Main memory size: 65536 Mbytes 
Instruction cache size: 32 Kbytes 
Data cache size: 32 Kbytes 

Secondary unified instruction/data cache size: 4 Mbytes 


Table 6 - Lomax - 400 MHz R12K OpenMP 